Discovering Trends in Text Databases

Authors

  • Brian Lent
  • Rakesh Agrawal
  • Ramakrishnan Srikant
Abstract

We address the problem of discovering trends in text databases. Trends can be used, for example, to discover that a company is shifting interests from one domain to another. We are given a database V of documents. Each document consists of one or more text fields and a timestamp. The unit of text is a word, and a phrase is a list of words. (We defer the discussion of more complex structures until the “Methodology” section.) Associated with each phrase is a history of the frequency of occurrence of the phrase, obtained by partitioning the documents based upon their timestamps. The frequency of occurrence in a particular time period is the number of documents that contain the phrase. (Other measures of frequency are possible, e.g. counting each occurrence of the phrase in a document.) A trend is a specific subsequence of the history of a phrase that satisfies the user's query over the histories. For example, the user may specify a “spike” query to find those phrases whose frequency of occurrence increased and then decreased.
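
The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a phrase matches only when its words appear contiguously in a document, that timestamps are plain integers (for example, years), and that time periods are half-open (start, end) ranges. The function names contains_phrase, phrase_history, and has_spike are hypothetical, and the spike test is one simple reading of "increased and then decreased".

```python
def contains_phrase(words, phrase):
    """Return True if `phrase` (a list of words) occurs contiguously in `words`.

    Assumption: contiguous matching; the abstract only says a phrase is a
    list of words.
    """
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def phrase_history(documents, phrase, periods):
    """Count, for each time period, the documents that contain `phrase`.

    `documents` is a list of (timestamp, text) pairs and `periods` is a list
    of half-open (start, end) ranges. Each document is counted at most once
    per period, matching the document-count frequency in the abstract.
    """
    counts = [0] * len(periods)
    for timestamp, text in documents:
        words = text.lower().split()
        if not contains_phrase(words, phrase):
            continue
        for k, (start, end) in enumerate(periods):
            if start <= timestamp < end:
                counts[k] += 1
                break
    return counts


def has_spike(history):
    """Return True if some part of `history` rises and then falls.

    Assumption: a "spike" is any interior point that is strictly higher
    than both of its neighbours.
    """
    return any(
        history[i - 1] < history[i] > history[i + 1]
        for i in range(1, len(history) - 1)
    )


# Hypothetical toy database: (year, text) pairs.
documents = [
    (1990, "neural network training"),
    (1991, "a neural network for vision"),
    (1991, "neural network hardware"),
    (1992, "network of neural units"),
    (1992, "neural network again"),
]
periods = [(1990, 1991), (1991, 1992), (1992, 1993)]

history = phrase_history(documents, ["neural", "network"], periods)
print(history, has_spike(history))
```

Counting each document at most once per period follows the frequency measure stated in the abstract. Counting every occurrence instead, the alternative the abstract mentions, would only change how phrase_history increments its counters.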

Similar resources

A review of text mining approaches and their function in discovering and extracting a topic

Background and aim: Four text mining methods are examined and focused on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...

Discovering Semantic Patterns in Bibliographically Coupled Documents

Issues in discovering knowledge in bibliographic databases are addressed. An example of semantic pattern analysis is used to demonstrate the methodological aspects of knowledge discovery in bibliographic databases. The semantic pattern analysis is based on the keywords selected from the documents grouped by bibliographical coupling. The frequency distribution patterns suggest the existence of a ...

Discovering Unknown Patterns in Free Text

A very large percentage of business and academic data is stored in textual format. With the exception of metadata, such as author, date, title and publisher, these data are not overtly structured like the standard, mainly numerical, data in relational databa...

Various Approaches in Text Pre-processing

Text mining, as an increasingly important field of research in Knowledge Discovery in Data (KDD), concentrates on discovering hidden patterns, rules, regularities and trends from textual data, such as natural language speech or web documents. The structure of textual data is considered implicit, which is different from the structured data that is stored in databases. The various natures of textual...

Trends and patterns of evolution for product innovation

Perhaps the most promising TRIZ tools are trends and patterns of evolution. The idea that technological systems tend to go forward in a way analogous to that of biological systems has been supporting the research of the evolution of several products. Some degree of coincidence to this analogy has been found in several cases using statistical analysis tools in patent databases. This paper starts ...

Publication date: 1997